Tolerance of Effectiveness Measures to Relevance Judging Errors
Abstract
Crowdsourcing relevance judgments for test collection construction is attractive because it can be more affordable than hiring high-quality assessors. A problem faced by all crowdsourced judgments, even judgments formed from the consensus of multiple workers, is that they will differ from the judgments produced by high-quality assessors. For two TREC test collections, we simulated errors in sets of judgments and then measured the effect of these errors on effectiveness measures. We found that some measures appear to be more tolerant of errors than others. We also found that achieving high rank correlation in the ranking of retrieval systems requires conservative judgments for average precision (AP) and nDCG, while precision at rank 10 requires neutral judging behavior. Conservative judging avoids mistakenly judging non-relevant documents as relevant at the cost of judging some relevant documents as non-relevant. In addition, we found that while conservative judging behavior maximizes rank correlation for AP and nDCG, minimizing the error in the measures' values requires more liberal behavior. Depending on the nature of a set of crowdsourced judgments, the judgments may be better suited to some effectiveness measures than others, and some effectiveness measures will require higher levels of judgment quality than others.
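The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes binary relevance labels and an error model in which each judgment is flipped independently, with one probability for non-relevant documents (`fpr`) and another for relevant documents (`fnr`). The function names, parameters, and toy data are all hypothetical. Under these assumptions, conservative judging corresponds to a low `fpr` paired with a higher `fnr`, and liberal judging to the reverse.

```python
import random


def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(qrels.get(doc, 0) for doc in ranking[:k]) / k


def average_precision(ranking, qrels):
    """Mean of the precision values at the rank of each relevant document."""
    num_relevant = sum(qrels.values())
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0):
            hits += 1
            total += hits / rank
    return total / num_relevant


def simulate_judging_errors(qrels, fpr, fnr, rng):
    """Assumed error model: flip each binary label independently.

    A non-relevant label becomes relevant with probability fpr;
    a relevant label becomes non-relevant with probability fnr.
    """
    noisy = {}
    for doc, rel in qrels.items():
        flip = rng.random() < (fnr if rel else fpr)
        noisy[doc] = 1 - rel if flip else rel
    return noisy


# Toy data (hypothetical): 100 documents, every fifth one truly relevant.
rng = random.Random(0)
docs = [f"d{i}" for i in range(100)]
true_qrels = {doc: 1 if i % 5 == 0 else 0 for i, doc in enumerate(docs)}
ranking = list(docs)  # stand-in for one retrieval system's ranked output

conservative = simulate_judging_errors(true_qrels, fpr=0.01, fnr=0.30, rng=rng)
liberal = simulate_judging_errors(true_qrels, fpr=0.20, fnr=0.05, rng=rng)

for name, qrels in [("true", true_qrels),
                    ("conservative", conservative),
                    ("liberal", liberal)]:
    print(name, average_precision(ranking, qrels), precision_at_k(ranking, qrels))
```

To study rank correlation rather than the error in a single score, one would repeat this over many systems and topics and compare the system ordering under the noisy judgments with the ordering under the original judgments, for example with Kendall's tau.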